On Estimating Variances for Topic Set Size Design

Authors

  • Tetsuya Sakai
  • Lifeng Shang

Abstract

Topic set size design is a suite of statistical techniques for determining the appropriate number of topics when constructing a new test collection. One vital input required for these techniques is an estimate of the population variance of a given evaluation measure, which in turn requires a topic-by-run score matrix. Hence, to build a new test collection, a pilot data set is a prerequisite. Recently, we ran an IR task at NTCIR-12 where the number of topics was actually determined using topic set size design with an initial pilot data set based on only five similar runs; a test collection was then constructed accordingly by pooling 44 runs from 16 participating teams for 100 topics. In this study, we treat the new test collection with the associated runs as a more reliable pilot data set to investigate how many teams and topics are actually necessary in the pilot data for obtaining accurate variance estimates.
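The abstract treats the variance estimate as an input computed from a topic-by-run score matrix. The Python below is a minimal sketch of that step. It assumes the population variance is estimated by the residual mean square V_E of a two-way ANOVA without replication; the abstract does not say which estimator the paper uses. The sketch then plugs V_E into a simple normal-approximation sample-size formula for a two-sided paired comparison. That formula is a simplification swapped in for illustration and is not the paper's procedure. The helper names `residual_variance` and `topics_needed`, the factor of 2 for the variance of per-topic score differences, and the pilot scores are all hypothetical.

```python
import math
from statistics import NormalDist


def residual_variance(matrix):
    """Residual mean square V_E of a two-way ANOVA without replication.

    `matrix` is a topic-by-run score matrix: one row per topic, one column
    per run. Using V_E as the variance estimate is an assumption of this sketch.
    """
    n = len(matrix)       # number of topics
    m = len(matrix[0])    # number of runs
    grand = sum(sum(row) for row in matrix) / (n * m)
    topic_means = [sum(row) / m for row in matrix]
    run_means = [sum(matrix[i][j] for i in range(n)) / n for j in range(m)]
    # Residual sum of squares after removing topic effects and run effects.
    s_e = sum(
        (matrix[i][j] - topic_means[i] - run_means[j] + grand) ** 2
        for i in range(n)
        for j in range(m)
    )
    return s_e / ((n - 1) * (m - 1))


def topics_needed(variance, min_diff, alpha=0.05, beta=0.20):
    """Normal-approximation topic count for a two-sided paired comparison.

    Hypothetical simplification: the variance of per-topic score differences
    is taken to be 2 * variance, and the normal distribution replaces any
    exact t-based computation.
    """
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)   # critical value for the test
    z_beta = z.inv_cdf(1 - beta)         # quantile for the target power
    var_diff = 2 * variance
    return math.ceil((z_alpha + z_beta) ** 2 * var_diff / min_diff ** 2)


# Hypothetical 4-topic x 3-run pilot matrix (made-up scores, not NTCIR-12 data).
pilot = [
    [0.40, 0.45, 0.30],
    [0.60, 0.70, 0.55],
    [0.20, 0.25, 0.10],
    [0.50, 0.50, 0.45],
]
v = residual_variance(pilot)
print(round(v, 5), topics_needed(v, min_diff=0.02))  # prints: 0.00083 33
```

The sketch shows why the pilot data matter. The required number of topics grows linearly with the variance estimate, so an estimate based on a few similar runs can shift the topic count noticeably.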

Similar resources

The Effect of Score Standardisation on Topic Set Size Design

Given a topic-by-run score matrix from past data, topic set size design methods can help test collection builders determine the number of topics to create for a new test collection from a statistical viewpoint. In this study, we apply a recently-proposed score standardisation method called std-AB to score matrices before applying topic set size design, and demonstrate its advantages. For topic ...

The Axis of Risk and Uncertainty in Hydrologic Design

The uncertainty of the hydrologic risk of flood-related hydraulic structures is determined by estimating the variance of the risk of failure based on the method of moments (MOM), probability weighted moments (PWM), and maximum likelihood (ML), assuming that the underlying model is the Gumbel distribution. The derived variances of the risk based on the three estimation methods are functions of s...

Estimating most productive scale size in DEA with real and integer value data

For better guiding a system, senior managers should have accurate information. Data Envelopment Analysis (DEA) helps managers achieve this objective. Thus, many investigations have been made to find the most productive scale size (MPSS) for the evaluated decision making units (DMUs). In this paper we consider the case where there exist subsets of input and output variables to be inte...

Estimating the Correlation in Bivariate Normal Data with Known Variances and Small Sample Sizes

We consider the problem of estimating the correlation in bivariate normal data when the means and variances are assumed known, with emphasis on the small sample case. We consider eight different estimators, several of them considered here for the first time in the literature. In a simulation study, we found that Bayesian estimators using the uniform and arc-sine priors outperformed several empi...

Bayesian Modeling of Heterogeneous Error and Genotype × Environment Interaction Variances

An important assumption in the analysis of multienvironment cultivar trials is homogeneity of error and genotype × environment interaction variances. When variances are heterogeneous, the best estimators of performance are obtained by weighting inversely to variance components. However, because variances are almost never known and must be estimated, the additional error introduced into the mode...

Publication date: 2016